AITopics | failure diagnosis

Diagnose, Correct, and Learn from Manipulation Failures via Visual Symbols

Zeng, Xianchao, Zhou, Xinyu, Li, Youcheng, Shi, Jiayou, Li, Tianle, Chen, Liangming, Ren, Lei, Li, Yong-Lu

arXiv.org Artificial Intelligence, Dec-4-2025

Vision-Language-Action (VLA) models have recently achieved remarkable progress in robotic manipulation, yet they remain limited in failure diagnosis and learning from failures. Additionally, existing failure datasets are mostly generated programmatically in simulation, which limits their generalization to the real world. In light of these limitations, we introduce ViFailback, a framework designed to diagnose robotic manipulation failures and provide both textual and visual correction guidance. Our framework uses explicit visual symbols to enhance annotation efficiency. We further release the ViFailback dataset, a large-scale collection of 58,126 Visual Question Answering (VQA) pairs along with their corresponding 5,202 real-world manipulation trajectories. Based on the dataset, we establish ViFailback-Bench, a benchmark of 11 fine-grained VQA tasks designed to assess the failure diagnosis and correction abilities of Vision-Language Models (VLMs), featuring ViFailback-Bench Lite for closed-ended and ViFailback-Bench Hard for open-ended evaluation. To demonstrate the effectiveness of our framework, we build the ViFailback-8B VLM, which not only achieves significant overall performance improvement on ViFailback-Bench but also generates visual symbols for corrective action guidance. Finally, by integrating ViFailback-8B with a VLA model, we conduct real-world robotic experiments demonstrating its ability to assist the VLA model in recovering from failures. Project Website: https://x1nyuzhou.github.io/vifailback.github.io/

artificial intelligence, large language model, natural language, (17 more...)

2512.02787

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.86)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)

ClusterRCA: An End-to-End Approach for Network Fault Localization and Classification for HPC System

Sun, Yongqian, Pan, Xijie, Xiong, Xiao, Tao, Lei, Wang, Jiaju, Zhang, Shenglin, Yuan, Yuan, Li, Yuqi, Jian, Kunlin

arXiv.org Artificial Intelligence, Sep-23-2025

Network failure diagnosis is challenging yet critical for high-performance computing (HPC) systems. Existing methods cannot be directly applied to HPC scenarios because of data heterogeneity and insufficient accuracy. This paper proposes a novel framework, called ClusterRCA, to localize culprit nodes and determine failure types by leveraging multimodal data. ClusterRCA extracts features from topologically connected network interface controller (NIC) pairs to analyze the diverse, multimodal data in HPC systems. To accurately localize culprit nodes and determine failure types, ClusterRCA combines classifier-based and graph-based approaches: it constructs a failure graph from the output of the state classifier and then performs a customized random walk on that graph to localize the root cause. Experiments on datasets collected by a top-tier global HPC device vendor show that ClusterRCA achieves high accuracy in diagnosing network failures for HPC systems. ClusterRCA also maintains robust performance across different application scenarios.
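
The abstract does not specify how the customized random walk works, so the following is only a minimal sketch of the general idea, assuming a generic anomaly-biased random walk with restart (in the style of personalized PageRank) rather than ClusterRCA's actual procedure. The function name `rank_culprits`, the per-node `anomaly` scores (standing in for state classifier output), and the `damping` parameter are all hypothetical.

```python
def rank_culprits(edges, anomaly, damping=0.85, iters=100, eps=1e-9):
    """Rank nodes of an undirected failure graph by the stationary
    probability of an anomaly-biased random walk with restart.

    edges   -- iterable of (node, node) pairs (hypothetical failure graph)
    anomaly -- dict node -> non-negative score (assumed classifier output)
    """
    nodes = sorted(anomaly)
    neighbors = {n: [] for n in nodes}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # Restart distribution: the walker jumps back to nodes in proportion
    # to their anomaly score.
    total = sum(anomaly[n] + eps for n in nodes)
    restart = {n: (anomaly[n] + eps) / total for n in nodes}

    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            out = neighbors[n]
            if not out:
                # Isolated node: send its mass through the restart distribution.
                for m in nodes:
                    nxt[m] += damping * rank[n] * restart[m]
                continue
            # Step to a neighbor with probability proportional to its score.
            weights = [anomaly[m] + eps for m in out]
            norm = sum(weights)
            for m, w in zip(out, weights):
                nxt[m] += damping * rank[n] * w / norm
        rank = nxt
    return sorted(rank.items(), key=lambda item: item[1], reverse=True)


# Toy chain a - b - c - d in which node "c" looks most anomalous.
ranking = rank_culprits(
    edges=[("a", "b"), ("b", "c"), ("c", "d")],
    anomaly={"a": 0.1, "b": 0.1, "c": 0.9, "d": 0.1},
)
print(ranking[0][0])
```

Power iteration is used here instead of sampling individual walks so that the ranking is deterministic; a sampled walk would approximate the same distribution.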

data mining, machine learning, node, (19 more...)

2506.20673

Country: Europe (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Energy (0.47)
Telecommunications (0.47)
Information Technology (0.46)

Technology:

Information Technology > Scientific Computing (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.68)
(2 more...)

TelOps: AI-driven Operations and Maintenance for Telecommunication Networks

Yang, Yuqian, Yang, Shusen, Zhao, Cong, Xu, Zongben

arXiv.org Artificial Intelligence, Dec-5-2024

Telecommunication Networks (TNs) have become the most important infrastructure for data communications over the last century. Operations and maintenance (O&M) is extremely important to ensure the availability, effectiveness, and efficiency of TN communications. Unlike artificial intelligence for IT Operations (AIOps), the popular O&M technique for IT systems (e.g., the cloud), O&M for TNs faces three fundamental challenges: topological dependence of network components, highly heterogeneous software, and restricted failure data. This article presents TelOps, the first AI-driven O&M framework for TNs, systematically enhanced with mechanism, data, and empirical knowledge. We provide a comprehensive comparison between TelOps and AIOps, and conduct a proof-of-concept case study on a typical O&M task (failure diagnosis) for a real industrial TN. As the first systematic AI-driven O&M framework for TNs, TelOps opens a new door to applying AI techniques to TN automation.

artificial intelligence, data mining, machine learning, (16 more...)

doi: 10.1109/MCOM.003.2300055

2412.04731

Country:

Asia > China > Shaanxi Province > Xi'an (0.05)
Europe > France (0.04)

Genre: Research Report (0.64)

Industry:

Telecommunications (1.00)
Information Technology > Security & Privacy (0.69)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Decentralized Failure Diagnosis of Stochastic Discrete Event Systems

Liu, Fuchun, Qiu, Daowen, Xing, Hongyan, Fan, Zhujun

arXiv.org Artificial Intelligence, Oct-30-2006

Recently, the diagnosability of stochastic discrete event systems (SDESs) was investigated in the literature, and the failure diagnosis considered was centralized. In this paper, we propose an approach to decentralized failure diagnosis of SDESs, where the stochastic system uses multiple local diagnosers to detect failures and each local diagnoser possesses its own information. In a way, the centralized failure diagnosis of SDESs can be viewed as a special case of the decentralized failure diagnosis presented in this paper with only one projection. The main contributions are as follows: (1) We formalize the notion of codiagnosability for stochastic automata, which means that a failure can be detected by at least one local stochastic diagnoser within a finite delay. (2) We construct a codiagnoser from a given stochastic automaton with multiple projections, and the codiagnoser associated with the local diagnosers is used to test the codiagnosability condition of SDESs. (3) We examine a number of basic properties of the codiagnoser. In particular, a necessary and sufficient condition for the codiagnosability of SDESs is presented. (4) We give a detailed computing method to check whether codiagnosability is violated. (5) We describe some examples to illustrate the applications of codiagnosability and its computing method.
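
As a rough illustration of the "at least one local diagnoser" idea, the sketch below checks a heavily simplified, non-stochastic version of codiagnosability over a finite set of complete traces. It is an assumption-laden toy, not the paper's codiagnoser construction: probabilities and detection delay are ignored, and the names `project` and `is_codiagnosable` are hypothetical.

```python
def project(trace, observable):
    """Erase the events that a local diagnoser cannot observe."""
    return tuple(event for event in trace if event in observable)


def is_codiagnosable(normal_traces, faulty_traces, local_observables):
    """Return True if, for every faulty trace, at least one local diagnoser
    sees a projection that no normal trace could have produced."""
    for faulty in faulty_traces:
        detected = False
        for observable in local_observables:
            seen = project(faulty, observable)
            if all(project(normal, observable) != seen
                   for normal in normal_traces):
                detected = True
                break
        if not detected:
            return False
    return True


# Toy system: "f" is an unobservable failure event.
normal = [("a", "b")]
faulty = [("f", "a", "b", "c")]
site_1 = {"a", "b"}   # cannot tell the faulty trace from the normal one
site_2 = {"c"}        # sees "c", which occurs only after the failure

print(is_codiagnosable(normal, faulty, [site_1, site_2]))
print(is_codiagnosable(normal, faulty, [site_1]))
```

Passing a single observable set corresponds to the centralized case with only one projection mentioned in the abstract.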

artificial intelligence, diagnosis, failure diagnosis, (12 more...)

cs/0610165

Country: North America > United States > Michigan (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.68)
